Exploring Under-appreciated Rewards

Authors

Ofir Nachum

Mohammad Norouzi

Dale Schuurmans

Abstract

This paper presents a novel form of policy gradient for model-free reinforcement learning (RL) with improved exploration properties. Current policy-based methods use entropy regularization to encourage undirected exploration of the reward landscape, which is ineffective in high-dimensional spaces with sparse rewards. We propose a more directed exploration strategy that promotes exploration of under-appreciated reward regions. An action sequence is considered under-appreciated if its log-probability under the current policy under-estimates its resulting reward. The proposed exploration strategy is easy to implement, requiring small modifications to the REINFORCE algorithm. We evaluate the approach on a set of algorithmic tasks that have long challenged RL methods. Our approach reduces hyper-parameter sensitivity and demonstrates significant improvements over baseline methods. The proposed algorithm successfully solves a benchmark multi-digit addition task and generalizes to long sequences, which, to our knowledge, is the first time that a pure RL method has solved addition using only reward feedback.
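
The "under-appreciated" criterion in the abstract can be illustrated with a small sketch. The abstract gives no formula, so everything concrete here is an assumption made for illustration: the temperature `tau`, the softmax normalization over sampled sequences, and the function name `underappreciation_weights`. It is one plausible way to turn "log-probability under-estimates reward" into per-sample weights, not the authors' implementation.

```python
import math

def underappreciation_weights(rewards, log_probs, tau=1.0):
    """Hypothetical helper: weight each sampled action sequence by how much
    its reward (scaled by an assumed temperature tau) exceeds its
    log-probability under the current policy.

    Weights are a softmax over (reward / tau - log_prob), so sequences whose
    log-probability under-estimates their reward receive the largest weight.
    """
    scores = [r / tau - lp for r, lp in zip(rewards, log_probs)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three sampled sequences: two earn the same reward, but the second is
# very unlikely under the current policy, so it is the under-appreciated one.
rewards = [1.0, 1.0, 0.0]
log_probs = [math.log(0.7), math.log(0.01), math.log(0.29)]
weights = underappreciation_weights(rewards, log_probs, tau=1.0)
```

In a REINFORCE-style update, such weights could scale the log-likelihood gradient of each sampled sequence, which would push probability mass toward rewarding sequences the policy currently rates as unlikely. How the actual method combines this with the standard expected-reward term is not stated in the abstract.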

Related resources

Improving Policy Gradient by Exploring Under-appreciated Rewards

Rationality in Human Movement.

It long has been appreciated that humans behave irrationally in economic decisions under risk: they fail to objectively consider uncertainty, costs, and rewards and instead exhibit risk-seeking or risk-averse behavior. We hypothesize that poor estimates of motor variability (influenced by motor task) and distorted probability weighting (influenced by relevant emotional processes) contribute to ...

Efficacy of pins and diplomas as a reward for long-term smoking cessation.

AIMS AND BACKGROUND Since 2004, the Antismoking Center of the National Cancer Institute of Milan has rewarded those who have been ex-smokers for longer than a year with a "former smoker" pin and a diploma. We investigated firstly whether these rewards contributed to maintain smoking withdrawal, secondly, which one of these was more appreciated and why, and thirdly, how they may have influenced ...

Exploring Effects of Intrinsic Motivation in Reinforcement Learning Agents

We explore Intrinsic Motivation as a reward framework for learning how to perform complicated tasks. Most reinforcement learning tasks assume the existence of a critic who rewards the agent for its actions. However, taking inspiration from biological agents, we can say that the real critic is the agent itself. We experiment with a model where the rewards are generated by the agent using a proces...

Exploring EFL Learners' Beliefs toward Communicative Language Teaching: A Case Study of Iranian EFL Learners

Although Communicative Language Teaching (CLT) has been widely advocated by a considerable number of applied linguists and English language teachers, its implementation in English as a Foreign Language (EFL) contexts has encountered a number of difficulties. Reviewing the literature suggests that one of the reasons for unsuccessful implementation of CLT may be neglect of learners' beliefs in t...

Publication date: 2017
